Comparative genome characterization of Echinicola marina sp. nov., isolated from deep-sea sediment, provides insight into carotenoid biosynthetic gene cluster evolution

Members of the genus Echinicola are carotenoid-pigmented bacteria that have been isolated from various hypersaline environments. Carotenoid accumulation in response to salt stress can stabilize the cell membrane and thereby aid survival. A pink-colored strain, SCS 3-6, was isolated from deep-sea sediment of the South China Sea. Growth occurred at 10–45 °C. The strain tolerated 10% (w/v) NaCl and grew at pH 5–9. The complete genome of SCS 3-6 comprises 5053 putative genes, with a total length of 5,693,670 bp and an average G + C content of 40.11 mol%. 16S rRNA gene sequence analysis indicated that strain SCS 3-6 is affiliated with the genus Echinicola, its closest relatives being Echinicola arenosa CAU 1574T (98.29%) and Echinicola shivajiensis AK12T (97.98%). For Echinicola species with available genome sequences, pairwise comparisons of average nucleotide identity (ANI) and in silico DNA-DNA hybridization (DDH) revealed ANIb values from 70.77 to 74.71%, ANIm values from 82.72 to 88.88%, and DDH values from 18.00 to 23.40%. To identify its genomic features, we compared the genome of strain SCS 3-6 with those of other Echinicola species. Phylogenetic analysis showed that strain SCS 3-6 formed a monophyletic clade. Genomic analysis revealed that strain SCS 3-6 possesses a complete carotenoid biosynthetic pathway, and the product was speculated to be astaxanthin. Based on the phenotypic and genotypic analyses in this study, strain SCS 3-6 is considered to represent a novel species of the genus Echinicola, for which the name Echinicola marina sp. nov. is proposed. The type strain is SCS 3-6T (= GDMCC 1.2220T = JCM 34403T).


Materials and methods
Sampling sites, enrichment and isolation. Strain SCS 3-6 was isolated from a deep-sea sediment sample collected from the South China Sea (depth of 1700 m; E 117°56.2877′, N 20°59.8047′). For enrichment, 1 g of sediment was incubated in 50 ml marine agar (Difco) for 72 h at 28 °C and 150 rpm. Then, 200 μl of the enrichment culture was transferred to fresh medium and cultured at 28 °C and 150 rpm for 72 h. This subculturing step was repeated three times. The supernatant of the enriched sample was then serially diluted (10⁻⁵ to 10⁻⁷) with PBS buffer (KH₂PO₄ 0.2 g, Na₂HPO₄·12H₂O 2.9 g, NaCl 8 g, KCl 0.2 g, pH 7.0). A 100 μl aliquot of each dilution was spread on marine agar plates and incubated at 28 °C for 48 h.
Morphological, physiological, and biochemical analysis. The morphological characteristics of the strain were investigated after 24 h of incubation on marine agar. The Gram reaction was examined according to Buck's method 26. Cell morphology was investigated using a scanning electron microscope (SU8010, Hitachi, Japan) and a transmission electron microscope (H-7650, Hitachi, Tokyo, Japan). Growth was observed at various temperatures (4, 15, 20, 25, 28, 30, 33, 37, 40, 45, and 50 °C) on marine agar. Tolerance to different NaCl concentrations (0-10%, w/v, in increments of 1%) and the pH range for growth (pH 4.0-11.0, at intervals of 1 unit) were tested at 28 °C for 7 days. Anaerobic growth was tested in an MGC AnaeroPouch-Anaero (Mitsubishi, Tokyo, Japan) at 28 °C for 7 days on marine agar plates. Catalase activity was tested with 3% (v/v) H₂O₂, and oxidase activity was tested using commercial strips (Huankai, Guangzhou, China) according to the manufacturer's instructions. Additional enzyme activities and carbon source utilization were examined using API 20NE and API ZYM strips (bioMérieux, Marcy-l'Étoile, France) and Biolog plates (Hayward, CA, USA), respectively, following the manufacturers' instructions.
Chemotaxonomic analysis. To analyze the chemotaxonomic features of strain SCS 3-6, a series of experiments 27 was carried out to determine the respiratory quinones, polar lipids, and fatty acids of SCS 3-6 and of closely related type strains (Echinicola shivajiensis JCM 17847T and Echinicola sediminis KCTC 52495T), using cell biomass obtained from cultures grown in marine agar (Difco) for 2 days at 28 °C. Respiratory quinones were extracted from freeze-dried cells (100 mg) with chloroform/methanol (2:1) and analyzed by HPLC 28. Polar lipids were extracted using a chloroform/methanol/water system 29 and separated by two-dimensional TLC. The spotted plate was developed in the first dimension with chloroform/methanol/water (65:25:4, by vol.) and in the second dimension with chloroform/methanol/acetic acid/water (85:12:15:4, by vol.). The polar lipids were identified by spraying with phosphomolybdate, ninhydrin, Dragendorff's reagent, molybdenum blue, and 1-methylnaphthol. For cellular fatty acid analysis, fatty acids were saponified, methylated, and extracted according to the Microbial Identification System (MIDI) protocol. The fatty acid methyl esters were analysed with a gas chromatograph (7890, Hewlett Packard) and identified with the Microbial Identification software package, based on the Sherlock Aerobic Bacterial Database (TSBA6) 30.
Sequencing and phylogenetic analyses. The 16S rRNA gene was amplified by PCR with the primers 27F and 1492R 31. The amplified gene was ligated into the pJET1.2/Blunt Vector (Thermo Scientific, Waltham, MA, USA) and sequenced by Sangon Biotech (Shanghai, China). The 16S rRNA gene sequence was submitted to the EzBioCloud Identify service 32 to retrieve information on related sequences. Sequences were aligned with the ClustalW algorithm in the MEGA 11 33 software package, and phylogenetic trees were constructed using the neighbor-joining (NJ) and maximum likelihood (ML) methods, followed by bootstrap analysis with 1000 replications 34. […] and grown for 24 h at 28 °C, 200 rpm. The bacteria were then washed in 1 × PBS and collected by centrifugation at 5000 rpm for 10 min at 4 °C. The genome of SCS 3-6 was extracted and sequenced by Majorbio Bio-pharm Technology Co., Ltd (Shanghai, China) on the PacBio and Illumina HiSeq X Ten platforms. A high-quality data set with a sequencing depth of 100-fold was generated. The draft (scan) map of the bacterial genome was created using SOAPdenovo2 35, and the complete genome was assembled using Canu and SPAdes 36. Glimmer and GeneMarkS 37 were used to predict coding sequences (CDS) and plasmid genes, respectively. tRNA and rRNA genes were predicted using tRNAscan-SE v2.0 and Barrnap, respectively. Functional annotation of SCS 3-6 was obtained from the Non-Redundant Protein (NR), Swiss-Prot, Pfam, Clusters of Orthologous Groups (COG), Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases using BLASTp with the same BLAST thresholds 38. Additionally, CAZymes were identified, classified and annotated using the CAZy database 39. Secondary metabolite gene clusters were identified with the antiSMASH program 40. The average nucleotide identity (ANI) was calculated using the BLAST (ANIb) and MUMmer (ANIm) algorithms (http://jspecies.ribohost.com/jspeciesws/) 41. DNA-DNA hybridization (DDH) was calculated according to the method described by Meier-Kolthoff et al. 42. The BPGA pipeline 43 was used with default parameters to perform model extrapolations of the Echinicola pangenome and core genome.
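As a rough illustration of how ANI and dDDH values are usually interpreted, the sketch below compares one set of pairwise values with the commonly cited species boundaries (about 95–96% ANI and 70% dDDH). The class name, function name and exact cut-offs are assumptions made for this example; they are not part of the JSpeciesWS service or of the DDH calculation used in the study.

```python
from dataclasses import dataclass

# Commonly cited species boundaries; assumed here for illustration only.
ANI_SPECIES_CUTOFF = 95.0   # percent
DDDH_SPECIES_CUTOFF = 70.0  # percent


@dataclass
class PairwiseRelatedness:
    """Genome relatedness values (in percent) for one pair of strains."""
    anib: float
    anim: float
    dddh: float


def same_species(pair: PairwiseRelatedness) -> bool:
    """Return True only if every index reaches its species cut-off."""
    return (
        pair.anib >= ANI_SPECIES_CUTOFF
        and pair.anim >= ANI_SPECIES_CUTOFF
        and pair.dddh >= DDDH_SPECIES_CUTOFF
    )


# Upper ends of the ranges reported in the abstract, combined as a best case
# (they need not all come from the same genome pair).
best_case = PairwiseRelatedness(anib=74.71, anim=88.88, dddh=23.40)
print(same_species(best_case))  # False
```

Even this best case falls well below all three cut-offs, which is consistent with treating the strain as a separate species.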
Assay of carotenoid production. The strains were inoculated into 100 mL marine agar and cultured at 28 °C for 72 h, and the cells were collected by centrifugation. The cells were washed with 3 mL sterile water, transferred to a 15 mL centrifuge tube, and centrifuged at 8000×g for 10 min at 4 °C, after which the supernatant was discarded. The cells were resuspended in 2 mL acetone and extracted with shaking for 10 min. After centrifugation at 8000×g for 10 min, the supernatant extract was transferred to a new tube. Then, 2 mL ethyl acetate was added to the original tube, and the extraction was continued with shaking for 10 min. After centrifugation at 8000×g for 10 min, this supernatant was combined with the previous extract, 3 mL sterile water was added, and the mixture was shaken. The mixture was centrifuged at 8000×g for 10 min to separate the phases, and the upper phase was transferred to a new EP tube. The ethyl acetate was evaporated to obtain a powder, which was redissolved in methanol. The solution was passed through a 0.22 µm filter membrane and analyzed by high-pressure liquid chromatography (Shimadzu HPLC LC-20AT) on a C18 column with a mobile phase of methanol-acetonitrile-water (80:15:5, by vol.). UV detection was performed at 478 nm.
Ethical approval. This article does not contain any studies with human participants or animals performed by any of the authors.

Results and discussion
Phylogenetic characteristics. A search of the 16S rRNA gene sequence against the EzTaxon database revealed that strain SCS 3-6 belongs to the phylum Bacteroidetes. The nearly complete 16S rRNA gene sequence (1518 bp) of strain SCS 3-6 was compared with the 30 most similar sequences by phylogenetic tree analysis. Strain SCS 3-6 shared 98.29% sequence similarity with Echinicola arenosa CAU 1574T, 97.98% with Echinicola shivajiensis AK12T, 96.97% with Echinicola sediminis 001-Na2T, and 96.22% with Echinicola jeungdonensis HMD3054T. The other species of the genus Echinicola showed < 96% sequence similarity with strain SCS 3-6. The NJ phylogenetic tree showed that strain SCS 3-6 clustered with members of the genus Echinicola and formed a monophyletic clade with Echinicola arenosa CAU 1574T (Fig. 1).
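To make the similarity percentages above concrete, the following minimal sketch computes percent identity between two pre-aligned sequences, counting only columns where neither sequence has a gap. This is one simple convention chosen for illustration; the function name and toy sequences are hypothetical, and the identification service used in the study may compute similarity differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence is gapped."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical sequences, not real 16S rRNA data):
# 9 ungapped columns, 8 of which match.
print(round(percent_identity("ACGT-ACGTA", "ACGTTACGAA"), 2))  # 88.89
```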
Phenotypic properties. Cells of strain SCS 3-6 were Gram-stain negative, non-motile by gliding, and rod-shaped (0.2-0.3 μm in width and 1.5-2.0 μm in length) (Fig. 2). Colonies were circular, convex and pink-pigmented after 2-3 days of incubation at 30 °C on marine agar plates. Strain SCS 3-6 was a facultative anaerobe. It grew at temperatures between 10 and 45 °C and at pH values between 5.0 and 9.0, and it tolerated 10% (w/v) NaCl.
General genome features and genetic relatedness. The complete genome of strain SCS 3-6, visualized using the Circos program 44, consisted of a single circular chromosome of 5,693,670 bp with a guanine-cytosine (GC) content of 40.11 mol% (Fig. 4). No plasmids were present in the genome of strain SCS 3-6. In total, 5053 coding sequences, 41 tRNA genes, and 5 sets of rRNA genes (5S, 16S, and 23S rRNA genes) were predicted (Supplementary Table S2). The GC content of strain SCS 3-6 was the lowest among the currently known Echinicola strains.
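The summary statistics above can be reproduced from a sequence and its annotation with very little code. The sketch below is a minimal illustration with hypothetical function names: it computes G + C content while ignoring ambiguous bases, and coding density as CDS per kilobase from the counts reported for SCS 3-6.

```python
def gc_content(sequence: str) -> float:
    """Return G + C content in percent, ignoring ambiguous bases such as N."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


def coding_density(n_cds: int, genome_bp: int) -> float:
    """Return the number of predicted coding sequences per kilobase."""
    if genome_bp <= 0:
        raise ValueError("genome length must be positive")
    return n_cds / (genome_bp / 1000.0)


# Toy sequence: 4 of the 8 unambiguous bases are G or C.
print(gc_content("ATGCGCNNAT"))  # 50.0

# Counts reported for strain SCS 3-6 (5053 CDS on 5,693,670 bp).
print(round(coding_density(5053, 5_693_670), 3))  # 0.887
```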
ANIb, ANIm, dDDH, and OrthoANI values were calculated to assess the genomic similarity of strain SCS 3-6 to the six strains of Echinicola species with available genome sequences. ANIb, ANIm, and dDDH values are presented in Table 2, and OrthoANI values are shown in Supplementary Fig. S1.
Core genome and pangenome of Echinicola. The genome of strain SCS 3-6 was compared with the available genomes of other Echinicola strains. The core genome sequences of the individual strains were calculated. Functional COG annotation revealed that the core genome had a higher proportion of genes classified in COG categories J (translation, ribosomal structure, and biogenesis), E (amino acid transport and metabolism), F (nucleotide transport and metabolism), H (coenzyme transport and metabolism), and I (lipid transport and metabolism), all of which are associated with basic biological functions that sustain life. The accessory genome and strain-specific genes were biased toward COG categories T (signal transduction mechanisms), K (transcription), and P (inorganic ion transport and metabolism) (Fig. 5D), which relate to information processing and ion metabolism and are probably linked to the adaptation of Echinicola to extreme habitats such as saline environments (Supplementary File).
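The distinction between core, accessory and strain-specific genes can be sketched as simple set arithmetic over gene-family presence. The example below is a toy illustration only: the strain names and family identifiers are hypothetical, and it does not reproduce the clustering or extrapolation steps of the BPGA pipeline used in the study.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def partition_pangenome(
    genomes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split gene families into core, accessory and strain-specific sets.

    core: present in every genome
    strain-specific: present in exactly one genome
    accessory: present in more than one genome but not in all
    """
    n = len(genomes)
    if n < 2:
        raise ValueError("at least two genomes are needed")
    counts = Counter(fam for fams in genomes.values() for fam in fams)
    core = {fam for fam, c in counts.items() if c == n}
    unique = {fam for fam, c in counts.items() if c == 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Hypothetical gene-family content of three strains.
toy = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam5"},
}
core, accessory, unique = partition_pangenome(toy)
print(sorted(core))       # ['fam1']
print(sorted(accessory))  # ['fam2']
print(sorted(unique))     # ['fam3', 'fam4', 'fam5']
```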

Polysaccharide utilization in the genome of E. marina. Marine Bacteroidetes are well known for their functional specialization in the decomposition of polysaccharides, which results from their large number of carbohydrate-active enzymes (CAZymes) 47. Marine bacteria may therefore provide abundant CAZyme resources for polysaccharide degradation. The genome of SCS 3-6 harbors 299 CAZymes (Fig. 6A), including glycoside hydrolases (GHs) […]. The largest class found in the SCS 3-6 genome was the GHs, comprising 155 genes (Fig. 6A). The putative GH genes belonged to 43 different families, with 1 to 26 genes per family; GH43 was the largest family. GH enzymes generally have great potential to hydrolyze complex carbohydrates, are considered key enzymes in carbohydrate metabolism, and can degrade the most abundant forms of biomass. A closer analysis of the GHs indicated that SCS 3-6 can degrade xylan. The GH43 enzymes encoded in the SCS 3-6 genome are classified, by mode of action and substrate preference, into xylanases, xylosidases, arabinofuranosidases, arabinosidases and others, indicating that SCS 3-6 may be able to utilize xylan. In addition, numerous genes assigned to other GH families involved in xylan degradation were detected, including three xylanases from GH10, one xylanase from GH30, two xylosidases from GH31 and two arabinofuranosidases from GH51. Furthermore, the two GH115 genes predicted in SCS 3-6 can cleave glucuronic acid side chains from native xylans. Polysaccharide utilization loci (PULs) were detected manually based on the presence of CAZyme clusters. Some xylanolytic enzymes of E. marina SCS 3-6 were located in a multi-gene PUL, including genes encoding xylanase (xynA, GH10), beta-xylosidase (GH43), alpha-glucuronidase (GH115) and one carbohydrate-binding module (CBM6) (Fig. 6B). The xylan PUL contained genes that encode a susC-susD system and was similar to that in E. rosea JT3085T 22.
Xylans are heteropolymers containing xylose, arabinose, glucuronic acid, galactose and other residues. Accordingly, the xylan PUL also included genes encoding alpha-N-arabinofuranosidase (abfA, GH43) and alpha-L-fucosidase (afcA, GH65). In addition, some auxiliary activity enzymes (two esterase genes and one gene encoding sialate O-acetylesterase) were involved in xylan degradation.
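Family-level counts such as those above can be tallied from per-gene CAZy labels. The sketch below assumes, for illustration, that each gene carries a label such as "GH43" or "GH43_12" (family with an optional subfamily suffix); the function name and the toy label list are hypothetical and are not taken from the annotation of SCS 3-6.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# CAZy classes: glycoside hydrolases, glycosyltransferases, polysaccharide
# lyases, carbohydrate esterases, auxiliary activities, binding modules.
CAZY_LABEL = re.compile(r"^(GH|GT|PL|CE|AA|CBM)(\d+)")


def count_families(labels: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count genes per CAZy class and per family; unrecognized labels are skipped."""
    per_class: Counter = Counter()
    per_family: Counter = Counter()
    for label in labels:
        match = CAZY_LABEL.match(label.strip())
        if match is None:
            continue
        cazy_class, number = match.groups()
        per_class[cazy_class] += 1
        per_family[f"{cazy_class}{number}"] += 1
    return per_class, per_family


# Hypothetical labels for seven genes.
toy_labels = ["GH43_1", "GH43", "GH10", "CBM6", "GH115", "CE1", "unknown"]
per_class, per_family = count_families(toy_labels)
print(per_class["GH"])            # 4
print(per_family.most_common(1))  # [('GH43', 2)]
```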

Table 2. ANIb, ANIm, and dDDH values between pairs of type strains of Echinicola species. * represents the same strain.

Carotenoid biosynthesis capability of E. marina. Genome annotation revealed a gene cluster directly involved in the synthesis of carotenoids. Based on the genes crtW, crtY, crtB, crtI and crtZ (Fig. 7C), we inferred the carotenoid biosynthetic pathway of E. marina SCS 3-6 and speculated that the product may be astaxanthin (Fig. 7A). Geranylgeranyl diphosphate (GGPP) is the direct precursor for carotenoid biosynthesis 48. Phytoene is the first carotenoid formed in the bacterial carotenoid biosynthesis pathway; it is formed by the condensation of two molecules of GGPP 49, catalyzed by phytoene synthase (CrtB). Next, the four desaturation steps from phytoene to lycopene are mediated by a single enzyme, CrtI 50. Following the synthesis of lycopene, a large variety of carotenoids can be produced through different lycopene cyclization reactions. One gene in the genome of the strain was annotated as lycopene cyclase (CrtY), so lycopene is presumably converted to β-carotene by this enzyme. The final synthesis of astaxanthin from β-carotene is a metabolic web with several branches, which differ in the steps and order in which β-carotene ketolase (CrtW) and β-carotene hydroxylase (CrtZ) act 51. Therefore, based on the presence of these crt genes in the genome, we speculated that the carotenoid synthesized may be astaxanthin. Furthermore, an astaxanthin standard and the carotenoids extracted from strain SCS 3-6 were analyzed by high-pressure liquid chromatography (HPLC), and the results supported this speculation (Fig. 7B).
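The pathway reasoning above can be expressed as a small lookup: given the crt genes found in a genome, how far along the proposed route from GGPP can synthesis proceed? The sketch below is a simplification for illustration only. It collapses the branched conversion of β-carotene to astaxanthin into a single step that requires both crtW and crtZ, so intermediates that need only one of the two genes are not modelled, and the function name is hypothetical.

```python
from typing import List, Set, Tuple

# Ordered steps of the proposed pathway: (substrate, product, required genes).
PATHWAY: List[Tuple[str, str, Set[str]]] = [
    ("GGPP", "phytoene", {"crtB"}),
    ("phytoene", "lycopene", {"crtI"}),
    ("lycopene", "beta-carotene", {"crtY"}),
    ("beta-carotene", "astaxanthin", {"crtW", "crtZ"}),
]


def furthest_product(genes_present: Set[str], start: str = "GGPP") -> str:
    """Return the last compound reachable from `start` with the given genes."""
    current = start
    for substrate, product, required in PATHWAY:
        # Stop at the first step whose genes are missing.
        if substrate != current or not required <= genes_present:
            break
        current = product
    return current


# crt genes reported in the carotenoid cluster of strain SCS 3-6.
scs_genes = {"crtW", "crtY", "crtB", "crtI", "crtZ"}
print(furthest_product(scs_genes))             # astaxanthin
print(furthest_product(scs_genes - {"crtW"}))  # beta-carotene
print(furthest_product(scs_genes - {"crtB"}))  # GGPP
```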
Echinicola strains are usually orange, pink or yellow owing to the presence of carotenoid pigments. In E. marina SCS 3-6, the pink pigment is cell-bound and was extracted using organic solvents. To investigate the evolution of the crt gene cluster, we predicted the carotenoid biosynthetic cluster in SCS 3-6 and compared it with those of related strains (Fig. 7C). Carotenoid biosynthesis gene clusters have been found in Echinicola strains. In the genome of E. marina SCS 3-6, the cluster contained five carotenogenic genes (crtW, crtZ, crtY, crtB, and crtI) in the same orientation, one stress-responsive sigma factor gene (rpoE), one gene (merR) encoding a transcriptional regulator, one gene (ispH) encoding an enzyme of the methylerythritol phosphate pathway of isoprenoid synthesis, and one gene (oma87) encoding an outer membrane protein. Based on the comparative

Conclusion
Based on the results of phenotypic and genotypic analyses, strain SCS 3-6 represents a novel species of the genus Echinicola. SCS 3-6 was distinguished from closely related Echinicola species by phylogenetic trees constructed from 16S rRNA gene sequences and core genome sequences. We investigated, for the first time, the relationships between members of the genus Echinicola. Analysis of the complete genome of SCS 3-6 indicated its capacity for polysaccharide degradation and carotenoid production.
Description of Echinicola marina sp. nov. Echinicola marina (ma.ri'na L. fem. adj. marina, of the sea, marine). Echinicola marina SCS 3-6 is a facultative anaerobe, Gram-stain negative, and non-motile by gliding. Cells are rod-shaped, 0.2-0.3 μm in width and 1.5-2.0 μm in length. Colonies are circular, convex and pink-pigmented after 2-3 days of incubation at 30 °C on marine agar. Strain SCS 3-6 is positive for catalase, oxidase and the hydrolysis of gelatin and casein, but negative for the hydrolysis of starch and Tween 80. Strain SCS 3-6 can use various carbon sources: d-maltose, d-trehalose, d-cellobiose, gentiobiose, sucrose,